Also use this to get all the pythons:
# install everything with Python 2 and 3.
conda create -n py36 python=3.6 anaconda
conda create -n py27 python=2.7 anaconda
# register py27 kernel - no need for "source" on windows
source activate py27
ipython kernel install
# same for py36, and install jupyterhub in the py36 env
source activate py36
ipython kernel install
pip install jupyterhub
Install necessary packages with:
pip install insert_package_name_here
Prefix with sudo if you're on a Mac.
conda install insert_package_name_here if you run into issues with pip
conda install -c conda-forge insert_package_name_here is also an option for certain packages.
You're probably going to want the following packages (though some may already be installed via Anaconda):
jupyter contrib nbextension install --user
jt -t grade3 -fs 12 -tfs 12 -nfs 115 -cellw 88% -T
jt -r
python -m nbopen.install_xdg
python -m nbopen.install_win
./osx-install.sh
python setup.py install
Open Jupyter notebook from terminal or cmd:
jupyter lab or jupyter notebook
http://localhost:8888/lab or http://localhost:8888/tree respectively.
jupyter nbconvert --to html_toc FILENAME.ipynb
c = get_config()
c.Exporter.preprocessors = ['pre_pymarkdown.PyMarkdownPreprocessor']
Double-click on this cell to see how everything was written!
Headings are made with preceding "#" signs. <h1> is #, <h2> is ##, etc.
Force new blank lines with <br> .
Italics are made by surrounding a word or phrase with asterisks, or with underscores, like so.
Bold words are made by surrounding a word or phrase with 2 asterisks on each end.
You can make a phrase both bold and italic by combining the above!
Put a ">" before a line to turn it into a blockquote.
Unhighlighted code goes between backticks: this is code
And you can define blocks of code by sandwiching them between 3 backticks on either end (you can even define syntax highlighting!)
x = [1, 2, 3]
for i in x:
    print(i)
Hyperlinks go in square brackets, with the link itself going in parentheses immediately after (no whitespace allowed between neighboring brackets)!
Images are set up just like hyperlinks, but with an exclamation point in front. The writing in square brackets serves as the alt-text for the image.
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/HW29067qVWk" frameborder="0" allowfullscreen></iframe>
# %%HTML
# <iframe src="https://fiddle.jshell.net/rahonavis75/ed4486f9/show/" width="800" height="500">
Sandwich your LaTeX between two dollar signs.
$$
\begin{equation*}
\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)
\end{equation*}
$$
See all commands.
lsmagic
See list of current variables in global scope. Can also specify a data type thereafter.
%who
And run terminal commands directly with "!"
!pip list
SHIFT+TAB will bring up help for your current function
CTRL+Enter executes the current cell, keeping your focus on it
SHIFT+Enter executes the current cell, and moves you down to the next cell
ALT+Enter executes the current cell AND makes a new one below
ESC brings you to command mode, where you can do a number of things:
A makes a new cell above
B makes a new cell below
D D (that's D twice) deletes a cell
X cuts selected cells
C copies the cells
V pastes the cells
Y turns the cell into code
M turns the cell into Markdown
CTRL+SHIFT+F brings up the command palette, with all available commands
Giant pandas tutorial and attendant notes available at the links.
Allow plots in the notebook itself, and enable some helpful functions.
%reset -f
%matplotlib inline
%config InlineBackend.figure_format = 'retina' # High-res graphs (rendered irrelevant by svg option below)
%config InlineBackend.print_figure_kwargs = {'bbox_inches':'tight'} # No extra white space
%config InlineBackend.figure_format = 'svg' # 'png' is default
import warnings
warnings.filterwarnings('ignore') # Because we are adults
Import example data.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
data = sns.load_dataset('tips')
data.head() # show first n entries (default is 5)
Change default graph appearance to something you like. See here for full list of available built-in styles.
sns.set_style("ticks") # e.g., ggplot, whitegrid, etc.
Plot histograms of tips grouped by sex side by side. Make sure both have the same x and y limits.
data['tip'].hist(by=data['sex'], sharex=True, sharey=True)
sns.despine() # Remove top and right side of box
plt.show() # Somewhat redundant in this context, but suppresses annoying text output.
Plot overlaid histograms.
grouped_by_sex = data.groupby('sex')
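# You can also add several arguments below like bins=20, or density=True
# (density is the newer name for the old normed argument, which recent matplotlib no longer accepts)
figure, axes = grouped_by_sex['tip'].plot(kind='hist', density=False, alpha=.5, legend=True)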
# Re-label legend entries, move legend to right-middle
axes.legend(['Men', 'Women'], loc=(0.75, 0.5))
sns.despine()
plt.show()
Show summary stats for the sexes.
grouped_by_sex['tip'].describe()
Get a subset of the data — here the tips given on Sunday at dinner time.
sunday_dinner_tips = data.tip[(data.day=="Sun") & (data.time=="Dinner")]
Perform an ANOVA, using R-style syntax.
import statsmodels.api as sm
from statsmodels.formula.api import ols
model = 'tip ~ sex * smoker'
lm = ols(model, data=data).fit()
table = sm.stats.anova_lm(lm, typ=2)
display(table)
Make the table prettier and more intelligible.
from prettypandas import PrettyPandas
def color_significant_green(val, alpha=0.05):
    # Green text for values below alpha, black otherwise
    color = 'green' if val < alpha else 'black'
    return 'color: %s' % color
def bold_significant(val, alpha=0.05):
    # Bold text for values below alpha, normal weight otherwise
    font_weight = 'bold' if val < alpha else 'normal'
    return 'font-weight: %s' % font_weight
t = PrettyPandas(table)
(
t.applymap(color_significant_green, alpha=.05, subset=['PR(>F)']) # alpha is optional here, of course
.applymap(bold_significant, alpha=.05, subset=['PR(>F)'])
.format("{:.3f}", subset=['sum_sq', 'F', 'PR(>F)']) # show only 3 decimal places
)
from numpy import sqrt
from scipy.stats import ttest_ind
def cohens_d(t, n):
    # Convert a t statistic to Cohen's d using the degrees of freedom (n - 2)
    return 2*t / sqrt(n - 2)
# Set up empty results table
columns = ['n', 't', 'p', 'd']
index = []
results = pd.DataFrame(index=index, columns=columns)
# Get data for t-test
male_tips = data[data['sex']=='Male']['tip']
female_tips = data[data['sex']=='Female']['tip']
# Perform t-test and surrounding calculations
n = male_tips.count() + female_tips.count()
t, p = ttest_ind(male_tips, female_tips)
d = cohens_d(t, n)
# Add data to table
comparison = 'Male vs. Female'
results.loc[comparison] = [n, t, p, d]
# Output pretty table
r = PrettyPandas(results)
(
r.applymap(color_significant_green, subset=['p'])
.applymap(bold_significant, subset=['p'])
.format("{:.3f}", subset=['t', 'p', 'd'])
)
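The cohens_d helper above converts a t statistic into an effect size with the shortcut d = 2t / sqrt(n - 2). As a rough check on how that shortcut behaves, here is a minimal sketch using only the standard library. The two groups are made-up numbers (not the tips data), and the helper names pooled_sd, t_statistic, and cohens_d_direct are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(x), len(y)
    return sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def t_statistic(x, y):
    """Equal-variance independent-samples t statistic."""
    n1, n2 = len(x), len(y)
    return (mean(x) - mean(y)) / (pooled_sd(x, y) * sqrt(1 / n1 + 1 / n2))


def cohens_d_from_t(t, n):
    """Shortcut used in the notebook: d = 2t / sqrt(n - 2)."""
    return 2 * t / sqrt(n - 2)


def cohens_d_direct(x, y):
    """Definition of Cohen's d: mean difference over the pooled SD."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


# Hypothetical data, chosen so the arithmetic is easy to follow
group_a = [3.0, 4.0, 5.0, 6.0, 7.0]
group_b = [1.0, 2.0, 3.0, 4.0, 5.0]

t_value = t_statistic(group_a, group_b)
d_shortcut = cohens_d_from_t(t_value, len(group_a) + len(group_b))
d_direct = cohens_d_direct(group_a, group_b)

print("t =", round(t_value, 3))
print("d (shortcut) =", round(d_shortcut, 3))
print("d (direct) =", round(d_direct, 3))
```

With small, equal-sized groups like these, the shortcut comes out somewhat larger than the pooled-SD definition; the gap shrinks as the groups grow.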
Requires development version of statsmodels package, available here.
pip install git+insert_link_here
import pandas as pd
import numpy as np
import statsmodels
from statsmodels.stats.anova import AnovaRM
statsmodels.__version__
Create simulated reaction time data for 2 levels of an independent variable.
N = 20
P = [1,2]
values = [998,511]
sub_id = [i+1 for i in range(N)]*len(P)
mus = np.concatenate([np.repeat(value, N) for value in values]).tolist()
rt = np.random.normal(mus, scale=112.0, size=N*len(P)).tolist()
iv = np.concatenate([np.array([p]*N) for p in P]).tolist()
df = pd.DataFrame({'id': sub_id, 'rt': rt, 'iv':iv})
Do the repeated measures ANOVA.
aovrm = AnovaRM(df, depvar='rt', subject='id', within=['iv'])
fit = aovrm.fit()
fit.summary()
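For a within-subject factor with only two levels, as in the simulated data above, the repeated measures ANOVA F statistic equals the square of the paired t statistic. The sketch below checks that by hand with the standard library. The reaction times are hypothetical fixed numbers rather than the random draws used above, and rm_anova_f and paired_t are illustrative helper names, not statsmodels functions.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(level_1, level_2):
    """Paired-samples t statistic on the per-subject differences."""
    diffs = [a - b for a, b in zip(level_1, level_2)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def rm_anova_f(scores_by_level):
    """One-way repeated measures ANOVA F statistic.

    scores_by_level is a list of lists: one list per level of the factor,
    with subjects in the same order in every list.
    """
    k = len(scores_by_level)      # number of levels
    n = len(scores_by_level[0])   # number of subjects
    all_scores = [x for level in scores_by_level for x in level]
    grand_mean = mean(all_scores)

    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_levels = n * sum((mean(level) - grand_mean) ** 2 for level in scores_by_level)
    subject_means = [mean(subject) for subject in zip(*scores_by_level)]
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_levels - ss_subjects

    df_levels = k - 1
    df_error = (k - 1) * (n - 1)
    return (ss_levels / df_levels) / (ss_error / df_error)


# Hypothetical reaction times (ms) for four subjects at two levels
rt_level_1 = [1010.0, 950.0, 1100.0, 980.0]
rt_level_2 = [520.0, 490.0, 560.0, 470.0]

t_value = paired_t(rt_level_1, rt_level_2)
f_value = rm_anova_f([rt_level_1, rt_level_2])

print("t squared =", round(t_value ** 2, 3))
print("F =", round(f_value, 3))
```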
Plot simple line graph with sample data.
line_data = range(1,10)
plt.figure()
plt.title("Example Graph", size="xx-large") # can also feed font point size, like 36
plt.xlabel("X-Axis Label", size="x-large")
plt.ylabel("Y-Axis Label", size="x-large")
plt.xlim(0,10)
plt.ylim(0,10)
plt.plot(line_data, 'b*-', markersize=10, linewidth=3, label='Sample Data') # b*- means blue star marker with line
plt.tick_params(axis="both", which="major", labelsize=14)
plt.legend(loc=(0.25, 0.75), scatterpoints=1)
plt.show()
Plot Anscombe's quartet
import seaborn as sns
sns.set(style="ticks")
# Load the example dataset for Anscombe's quartet
anscombe = sns.load_dataset("anscombe")
# Show the results of a linear regression within each dataset
# Semi-colon suppresses the non-graph output
# Note: seaborn 0.9 and later call the old size argument "height"
ax = sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=anscombe,
                col_wrap=2, ci=None, palette="muted", height=4,
                scatter_kws={"s": 50, "alpha": 1});
# Change axis labels
ax.set(xlabel='X', ylabel='Y');
Plot a bar plot of total bill by day. Naturally, this defaults to showing a 95% confidence interval.
ax = sns.barplot(x="day", y="total_bill", data=data)
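seaborn's bar plot draws the mean of each group, and its default error bars are a bootstrapped 95% confidence interval for that mean. A minimal standard-library sketch of the percentile bootstrap idea follows. The bill amounts are made-up numbers, bootstrap_ci is a hypothetical helper, and seaborn's own implementation may differ in detail.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    boot_means = sorted(
        mean(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = boot_means[round(tail * n_boot)]
    upper = boot_means[round((1 - tail) * n_boot) - 1]
    return lower, upper


# Hypothetical bill amounts for a single day
bills = [12.5, 18.0, 22.4, 9.9, 30.1, 16.7, 25.3, 14.2, 19.8, 21.6]

low, high = bootstrap_ci(bills)
print("mean:", round(mean(bills), 2))
print("95% CI:", round(low, 2), "to", round(high, 2))
```

The interval narrows as the sample grows, which is why days with fewer observations tend to show longer error bars.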
Plot violin plot with overlaid beeswarm plot.
fig, ax = plt.subplots()
# Output to the size of A4 paper
fig.set_size_inches(11.7, 8.27)
# Overlay a swarmplot on top of a violinplot
ax = sns.violinplot(x="day", y="total_bill", data=data, inner=None)
ax = sns.swarmplot(x="day", y="total_bill", data=data, color="white")
def set_titles(thisPlot, titleList, fontSize):
    for ax, title in zip(thisPlot.axes.flat, titleList):
        ax.set_title(title, fontsize=fontSize)
def set_labels(thisPlot, xLabel, yLabel, fontSize):
    thisPlot.set_xlabels(xLabel, fontsize=fontSize)
    thisPlot.set_ylabels(yLabel, fontsize=fontSize)
def set_xtick_labels(thisPlot, tickList, fontSize):
    thisPlot.set_xticklabels(tickList, fontsize=fontSize)
def set_legend(thisPlot, legendEntries, fontSize):
    # find where last graph is so we can put the legend there
    maxIndex = max(thisPlot.axes.shape) - 1
    # format the legend, placing it outside the axes
    thisPlot.axes[0][maxIndex].legend(bbox_to_anchor=(1.05, 1), loc=2,
                                      fontsize=fontSize, borderaxespad=0.)
    legend = thisPlot.axes[0][maxIndex].get_legend()
    labels = legend.get_texts()
    for i, thisLabel in enumerate(labels):
        labels[i].set_text(legendEntries[i])
# Make plots -- many of these arguments are optional
# Note: seaborn 0.9 and later call factorplot "catplot" and its size argument "height"
barPlot = sns.catplot(x="day", y="total_bill", hue="sex",
                      col="time", kind="bar", data=data,
                      height=5, aspect=1, legend=False)
beeswarmPlot = sns.catplot(x="day", y="total_bill", hue="sex",
                           col="time", kind="swarm", dodge=True,
                           data=data, height=5, aspect=1, legend=False)
# Format them nicely!
# Axis labels
xLabel = ""# "Day"
yLabel = "Total Bill"
set_labels(barPlot, xLabel, yLabel, 20)
set_labels(beeswarmPlot, xLabel, yLabel, 20)
# Titles
title_list = ["Lunch", "Dinner"]
titles = [x.title() for x in title_list] # ["Bimodal", "Normal", "Skewed"]
set_titles(barPlot, titles, 30)
set_titles(beeswarmPlot, titles, 30)
# X axis tick labels or category labels
x_tick_labels = ["Thursday", "Friday", "Saturday", "Sunday"]
set_xtick_labels(barPlot, x_tick_labels, 15)
set_xtick_labels(beeswarmPlot, x_tick_labels, 15)
# Change legends
legendEntries = ["Male", "Female"]
set_legend(barPlot, legendEntries, 15)
set_legend(beeswarmPlot, legendEntries, 15)
# Save plots
# barPlot.savefig("barPlot.svg") # can also use other extensions, like .png
# beeswarmPlot.savefig("beePlot.svg")
Made using bokeh. See here for a great tutorial, and here for the attendant notebook. Code below adapted from linked code to our current dataset.
from bokeh.plotting import figure, output_notebook, show
this_plot= figure(width=600, height=600)
this_plot.circle(x=data['total_bill'], y=data['tip'], size=10, alpha=0.7)
output_notebook() # to output inline
show(this_plot)
Make better, more interactive plot. Let's plot a scatterplot of tip amount vs. total bill, separately for men and women.
from bokeh.plotting import figure, output_notebook, show, ColumnDataSource
import bokeh.models.tools as tools
# Get relevant subsets of data
male_data = data[data['sex'] == 'Male']
female_data = data[data['sex'] == 'Female']
# Convert to format bokeh understands
source_male = ColumnDataSource(male_data)
source_female = ColumnDataSource(female_data)
# Set up figure
this_plot = figure(width=600, height=600)
# Newer bokeh versions use legend_label in place of the older legend argument
this_plot.circle(source=source_male, x='total_bill', y='tip', color='teal',
                 size=10, alpha=0.7, legend_label='Men')
this_plot.circle(source=source_female, x='total_bill', y='tip', color='darkorange',
                 size=10, alpha=0.7, legend_label='Women')
# Set axis labels
this_plot.xaxis.axis_label = "Total Bill"
this_plot.yaxis.axis_label = "Tip Amount"
# Show information when hovering the mouse over datapoints
this_plot.add_tools(tools.HoverTool(tooltips=[('Day', '@day')])) # use @ to choose feature from dataset
# Hide all circles of a given category when clicked in legend
this_plot.legend.click_policy = 'hide'
output_notebook()
show(this_plot)
import holoviews as hv
hv.extension('bokeh', 'matplotlib')
ds = hv.Dataset(data, kdims=["sex", "smoker", "total_bill"],
vdims=["time", "size", "day", "tip"])
%%output backend='bokeh'
%%output size=200
%%opts Scatter [tools=['hover']] (size=8 alpha=0.5)
kdims=["tip"]
vdims=["total_bill", "day", "time", "size"] # include "smoker" if you don't want it as drop-down choice
# Scatter plot with hover tool that includes all the things
scatter = ds.to(hv.Scatter, kdims, vdims).overlay('sex')
scatter
from pivottablejs import pivot_ui
pivot_ui(data)
import matplotlib.pyplot as plt
from ipywidgets import *
from numpy import pi, arange, sin
t = arange(0, 1.0, 0.01)
def pltsin(f):
    plt.plot(t, sin(2*pi*t*f))
    plt.show()
interact(pltsin, f=(1,10,0.1))
Plotly is another package for producing really nice and interactive graphs, but it requires signing up for an account to initialize it. After initialization you can use it online by default (which means all of your graphs get saved to the cloud for everyone to see forever) or you can use it offline (as demoed below). Examples taken or modified from here.
import plotly
# plotly.tools.set_credentials_file(username='XXX', api_key='XXX') # initialize with your credentials -- only need to do once ever.
from plotly.graph_objs import Scatter, Layout
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot({
"data": [Scatter(x=[1, 2, 3, 4], y=[4, 3, 2, 1])],
"layout": Layout(title="hello world")
})
When I first tried using plotly I sometimes got "IOPub data rate exceeded" errors. Here's how you fix that:
Run jupyter notebook --generate-config to generate a clean configuration file with all parameters commented out
In that file, set c.NotebookApp.iopub_data_rate_limit and c.NotebookApp.iopub_msg_rate_limit to be some absurdly large numbers
import plotly.offline as py
import plotly.figure_factory as ff
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")
table = ff.create_table(df)
py.iplot(table, filename='plotly/table1')  # forward slash: '\t' in a normal string is a tab character
import plotly.offline as py
from plotly.graph_objs import *
data = [Bar(x=df.School,
y=df.Gap)]
py.iplot(data)
trace_women = Bar(x=df.School,
y=df.Women,
name='Women',
marker=dict(color='#ffcdd2'))
trace_men = Bar(x=df.School,
y=df.Men,
name='Men',
marker=dict(color='#A2D5F2'))
trace_gap = Bar(x=df.School,
y=df.Gap,
name='Gap',
marker=dict(color='#59606D'))
data = [trace_women, trace_men, trace_gap]
layout = Layout(title="Average Earnings for Graduates",
xaxis=dict(title='School'),
yaxis=dict(title='Salary (in thousands)'))
fig = Figure(data=data, layout=layout)
py.iplot(fig)
data = [dict(
    visible = False,
    line=dict(color='#00CED1', width=6),  # hex colors need the leading '#'
    name = '𝜈 = '+str(step),
    x = np.arange(0,10,0.01),
    y = np.sin(step*np.arange(0,10,0.01))) for step in np.arange(0,5,0.1)]
data[10]['visible'] = True
steps = []
for i in range(len(data)):
    step = dict(
        method = 'restyle',
        args = ['visible', [False] * len(data)],
    )
    step['args'][1][i] = True # Toggle i'th trace to "visible"
    steps.append(step)
sliders = [dict(
active = 10,
currentvalue = {"prefix": "Frequency: "},
pad = {"t": 50},
steps = steps
)]
layout = dict(sliders=sliders)
fig = dict(data=data, layout=layout)
py.iplot(fig)
s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)
r = 2 + np.sin(7 * sGrid + 5 * tGrid) # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid) # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid) # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid) # z = r*cos(t)
surface = Surface(x=x, y=y, z=z)
data = Data([surface])
layout = Layout(
title='Parametric Plot',
scene=Scene(
xaxis=XAxis(
gridcolor='rgb(255, 255, 255)',
zerolinecolor='rgb(255, 255, 255)',
showbackground=True,
backgroundcolor='rgb(230, 230,230)'
),
yaxis=YAxis(
gridcolor='rgb(255, 255, 255)',
zerolinecolor='rgb(255, 255, 255)',
showbackground=True,
backgroundcolor='rgb(230, 230,230)'
),
zaxis=ZAxis(
gridcolor='rgb(255, 255, 255)',
zerolinecolor='rgb(255, 255, 255)',
showbackground=True,
backgroundcolor='rgb(230, 230,230)'
)
)
)
fig = Figure(data=data, layout=layout)
py.iplot(fig)
Use set_trace() where you want the debugger to start.
'n' moves onto the next line
'c' continues execution of the script
from IPython.core.debugger import set_trace
def increment_value(a):
    a += 1
    set_trace()
    print(a)
increment_value(3)
import inspect
import numpy as np
print(inspect.getsource(np))
inspect.getfile(np)
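inspect.getsource works on individual functions and classes as well as whole modules, which usually gives more focused output than printing a package's entire top-level file. A small sketch using only the standard library follows. The choice of json.dumps as the target is just for illustration, and objects implemented in C have no Python source to show.

```python
import inspect
import json

# Source code of a single function rather than a whole module
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])

# The call signature, without reading the full source
print(inspect.signature(json.dumps))

# Where the module lives on disk
print(inspect.getfile(json))
```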
If you want to start digging deeper into Python, you can learn some cool things here, and here, and here.
That said, here is my favorite random snippet of Python code ever. You can swap variable values without needing any temporary variables via tuple unpacking.
a = "A"
b = "B"
# Swap!
a, b = b, a
print("a = " + a)
print("b = " + b)
And extended unpacking is interesting to wrap your head around (Python 3 only).
a, *b, c = [1, 2, 3, 4, 5, 6]
print(a)
print(b)
print(c)
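Extended unpacking is handy in practice for splitting a header from the rest of a sequence, or for peeling fields off each record in a loop. A short sketch with made-up records:

```python
# Hypothetical rows: a header followed by data records
rows = [
    ("day", "total_bill", "tip"),
    ("Thur", 27.2, 4.0),
    ("Sun", 16.5, 1.5),
]

# Split the header from everything after it
header, *records = rows

# Peel the first field off each record; the starred name collects the rest
totals = {}
for day, *amounts in records:
    totals[day] = sum(amounts)

# The starred name is always a list, even for strings or when nothing is left
first, *middle, last = "abcde"
only, *rest = [1]

print(header)
print(totals)
print(middle, rest)
```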
List comprehensions are also extremely useful, allowing you to program almost as if you were writing a sentence in English.
# get sum of squares of numbers taken from the range 1 to 10
sum(i**2 for i in range(1, 11))
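Strictly speaking, the expression inside sum(...) above is a generator expression, a close relative of the list comprehension that produces values one at a time instead of building a list first. A brief sketch of both forms, plus a filtered variant:

```python
# List comprehension: builds the whole list in memory
squares_list = [i**2 for i in range(1, 11)]
total_from_list = sum(squares_list)

# Generator expression: feeds values to sum() one at a time
total_from_generator = sum(i**2 for i in range(1, 11))

# A condition at the end filters the items
even_squares = [i**2 for i in range(1, 11) if i % 2 == 0]

print(total_from_list, total_from_generator)
print(even_squares)
```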
Zipping lists is another one of my favorite features.
a = ['a', 'b', 'c']
b = [1, 2, 3]
c = zip(a, b)
print(list(c)) # need to cast into a list because a zip object is a lazy iterator in Python 3
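Two follow-on tricks are worth knowing: a zip object can only be consumed once, and zip(*pairs) reverses the operation. A minimal sketch:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

pairs = zip(letters, numbers)
first_pass = list(pairs)    # consumes the zip object
second_pass = list(pairs)   # nothing left the second time around

# Zipping keys and values straight into a dictionary
lookup = dict(zip(letters, numbers))

# "Unzipping": the * operator spreads the pairs back out into zip
unzipped_letters, unzipped_numbers = zip(*first_pass)

print(first_pass)
print(second_pass)
print(lookup)
print(unzipped_letters, unzipped_numbers)
```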
Note that this requires running from a Python 3 instance of Jupyter (in my case, at least).
In theory, you should just be able to run this line and be all set, but it didn't work for me: conda install -c r r-essentials
If that didn't work, go through these steps:
install.packages('devtools')
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec() # to register the kernel in the current R installation
install.packages('ggplot2', dependencies=TRUE)
pip install rpy2 from your command line/terminal
pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl or whatever your .whl file is called from within the directory that has the file.
First, make some example data in Python.
import pandas as pd
df = pd.DataFrame({'Letter': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],
'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]})
Load extension allowing one to run R code from within a Python notebook.
%load_ext rpy2.ipython
Do stuff in R with cell or line magics. "-i" imports to R, "-o" outputs from R back to Python.
%%R
install.packages("ggplot2", dep=TRUE)
install.packages("tidyr", dep=TRUE)
install.packages("dplyr", dep=TRUE)
%%R -i df
library("ggplot2")
ggplot(data = df) + geom_point(aes(x = X, y = Y, color = Letter, size = Z))
pip install matlab_kernel
pip install pymatbridge
If you're getting a "zmq channel closed" error, open jupyter notebook from a different port when using MATLAB
jupyter notebook --port=8889
Load MATLAB extension for running MATLAB code within a Python notebook.
%load_ext pymatbridge
Do MATLAB things with line or cell magics.
%%matlab
a = linspace(0.01,6*pi,100);
plot(sin(a))
grid on
hold on
plot(cos(a),'r')
Exit MATLAB when done.
%unload_ext pymatbridge
Note that Javascript executes as the notebook is opened, even if it's been exported as HTML!
%%javascript
console.log('hey!')